[QNN][TFLite] Parsing TFLite quantized models. #3900

anijain2305 · 2019-09-05T18:48:21Z

As of now, QNN dialect has enough QNN ops for Inception model. This PR takes the TFLite Pre-quantized model and parses it for a handful of QNN operators.

@jackwish @FrozenGene @apivovarov @zhiics @shoubhik

anijain2305 · 2019-09-05T21:36:40Z

python/tvm/relay/frontend/tflite.py

@@ -1030,4 +1153,6 @@ def from_tflite(model, shape_dict, dtype_dict):
    outputs = [exp_tab.get_expr(get_tensor_name(subgraph, i)) for i in model_outputs]
    outputs = outputs[0] if len(outputs) == 1 else _expr.Tuple(outputs)
    func = _expr.Function(analysis.free_vars(outputs), outputs)
-    return _module.Module.from_expr(func), params
+    mod = _module.Module.from_expr(func)
+    mod = tvm.relay.qnn.transform.CanonicalizeOps()(mod)


@zhiics @vinx13 @tqchen Should we remove this from here and make it a part of build_module.cc?

shoubhik · 2019-09-09T21:20:01Z

tests/python/frontend/tflite/test_forward.py

@@ -848,6 +848,25 @@ def test_forward_inception_v4_net():
    tvm.testing.assert_allclose(np.squeeze(tvm_output[0]), np.squeeze(tflite_output[0]),
                                rtol=1e-5, atol=1e-5)

+def test_forward_quantized_inception_v1_net():


can we rename it to test_forward_qnn_inception_v1_net

python/tvm/relay/frontend/tflite.py

mshawcroft · 2019-09-30T09:47:54Z

Looks like the latest version of this PR is dependent on #4033 for the renaming of qnn.quantized_dense to qnn.dense.

python/tvm/relay/frontend/tflite.py

anijain2305 · 2019-09-30T17:32:48Z

@jackwish @FrozenGene @apivovarov @zhiics @shoubhik

Ping to review please. This has been parked for sometime. I think this will benefit users who want to read pre-quantized models.

mshawcroft · 2019-10-01T11:29:32Z

This patch is a big step forward, the latest version works for a number of networks so would be good to get it merged sooner rather than later.

There are a few common failure cases that we see, not sure if these are expected or unexpected at the moment:

TVMError: Check failed: idx < data_.size() && data_[idx].second != 0: Attribute TOpPattern has not been registered for Operator qnn.requantize

TVMError: Check failed: idx < data_.size() && data_[idx].second != 0: Attribute TOpPattern has not been registered for Operator qnn.dequantize

FrozenGene · 2019-10-01T15:24:47Z

@anijain2305 I am enjoying National Day holiday, plan to come back Oct 11. Would you mind waiting for my review after that? Thanks.

FrozenGene · 2019-10-01T15:26:36Z

@mshawcroft your issue should not be related with this pr. But it should be resolved. I think register for injective is enough.

anijain2305 · 2019-10-01T16:24:27Z

TVMError: Check failed: idx < data_.size() && data_[idx].second != 0: Attribute TOpPattern has not been registered for Operator qnn.requantize

TVMError: Check failed: idx < data_.size() && data_[idx].second != 0: Attribute TOpPattern has not been registered for Operator qnn.dequantize

Thanks for checking it for many networks. You are seeing this error because you have to call

mod = tvm.relay.qnn.transform.CanonicalizeOps()(mod)

See the changes made in this PR in the file - tests/python/frontend/tflite/test_forward.py

Pre-quantized models go through a Relay dialect called QNN. There are a couple of QNN passes that convert all QNN ops to a sequence of already existing Relay ops. Currently, these QNN passes are not a part of relay.build module and have to be called explicitly. We are working on simplifying this in another PR (#3971). Both of the PRs have been sitting idle for some time. Sorry for the inconvenience. We will try to get them in soon.

anijain2305 · 2019-10-03T06:44:05Z

Given #3971 got merged, this PR is the only missing piece for reading pre-quantized TFLite models.

Reviews will be greatly appreciated - @jackwish @apivovarov @zhiics @shoubhik @tqchen @yzhliu @u99127

u99127

Otherwise LGTM.

u99127 · 2019-10-01T21:15:38Z

python/tvm/relay/frontend/tflite.py

+            assert len(output_tensors) == 1, "output tensors should be 1"
+            output_tensor = output_tensors[0]
+            assert self.has_same_qnn_params(input_tensor, output_tensor), \
+                    "qnn.op.reshape requires input and output qnn params to be same"


If this is a requirement of tflite quantized outputs, then the error message should be :

"tflite.reshape requires input and output scale and zero_points to be the same"

else:

should this not be NotImplementedError(...) ?

Thanks. Will update the message.

u99127 · 2019-10-01T21:17:04Z

python/tvm/relay/frontend/tflite.py

+                out = _op.nn.avg_pool2d(in_expr, **params)
+            else:
+                assert self.has_same_qnn_params(input_tensor, output_tensor), \
+                        "qnn.op.avg_pool2d requires input and output qnn params to be same"


Same comment as the other one ?

Yes, I will change the message.

FrozenGene · 2019-10-12T06:18:55Z

python/tvm/relay/frontend/tflite.py

@@ -153,13 +155,24 @@ def get_tensors(self, tensors_idx_list):
        return_list = list()
        for tensor_idx in tensors_idx_list:
            if tensor_idx < 0:
-                return_list.append(TensorWrapper(tensor_idx, 0, 0))
+                return_list.append(TensorWrapper(tensor_idx, 0, 0, None))


The default value is None. we don't need to change this line of code.

FrozenGene · 2019-10-12T06:21:23Z

python/tvm/relay/frontend/tflite.py

+                if qnn_params['scale'] == 0 and qnn_params['zero_point'] == 0:
+                    qnn_params = None
+            return_list.append(TensorWrapper(tensor_idx, tensor, buffer, qnn_params))


Better way is to check if qnn_params['scale'] != 0 or qnn_params['zero_point'] != 0:, because the default value of qnn_params is None in line 166.

FrozenGene · 2019-10-12T06:24:14Z

python/tvm/relay/frontend/tflite.py

+    def has_same_qnn_params(self, tensor_a, tensor_b):
+        return tensor_a.qnn_params['scale'] == tensor_b.qnn_params['scale'] and \
+                tensor_a.qnn_params['zero_point'] == tensor_b.qnn_params['zero_point']


Minor code style suggestion. tensor_a be lhs_tensor and tensor_b be rhs_tensor like we did in convert_elemwise.

FrozenGene · 2019-10-12T06:24:49Z

python/tvm/relay/frontend/tflite.py


+        # If the tensors are quantized, ensure that input/output qnn params are same.
+        if input_tensor.qnn_params is not None:


Could be simplified if input_tensor.qnn_params:

FrozenGene · 2019-10-12T06:25:18Z

python/tvm/relay/frontend/tflite.py

+        # If the tensors are quantized, ensure that input/output qnn params are same.
+        if input_tensor.qnn_params is not None:
+            output_tensors = self.get_output_tensors(op)
+            assert len(output_tensors) == 1, "output tensors should be 1"


"output tensors should be 1" -> "The length of output tensors should be 1"

FrozenGene · 2019-10-12T06:28:16Z

python/tvm/relay/frontend/tflite.py


+        # Softmax does not go well with Int8. So Dequantize back to FP32.


Better leave one TODO. Softmax's int8 computation is complex but we could leave it to implement when we have spare time. The same situation suits for like sigmoid / mean operators.

FrozenGene · 2019-10-12T06:30:50Z

python/tvm/relay/frontend/tflite.py

@@ -738,7 +819,15 @@ def convert_conv(self, op, conv_type):
            raise tvm.error.OpAttributeUnImplemented(
                'Padding format {} is not supported for operator Conv.'.format(padding))

-        out = _op.nn.conv2d(data=in_expr, weight=weight_expr, **params)
+        # Check if the inputs are quantized. If yes, call qnn conv2d.
+        if input_tensor.qnn_params is None:


Same comment before.

FrozenGene · 2019-10-12T06:31:04Z

python/tvm/relay/frontend/tflite.py

+                    .format('qnn.op.dense'))
+
+        # Finally if the dense is quantized. Add a requantize at the end.
+        if output_tensor.qnn_params is not None:


Same comment before.

FrozenGene · 2019-10-12T06:31:12Z

python/tvm/relay/frontend/tflite.py

@@ -581,7 +640,23 @@ def convert_fully_connected(self, op):

        # If we have fused activations
        if fused_activation_fn != ActivationFunctionType.NONE:
-            out = self.convert_fused_activation_function(out, fused_activation_fn)
+            if output_tensor.qnn_params is None:


Same comment before.

FrozenGene · 2019-10-12T06:32:55Z

tests/python/frontend/tflite/test_forward.py

@@ -1029,3 +1048,6 @@ def test_forward_ssd_mobilenet_v1():
    test_forward_inception_v3_net()
    test_forward_inception_v4_net()
    test_forward_ssd_mobilenet_v1()
+
+    # End to End quantized
+    test_forward_qnn_inception_v1_net()


Let us add Mobilenet V2 quant model if this PR could run it. Because it is very popular and often compared across different framework.

Makes sense. Till now I have been focussing on the server side, and thus I used Inception. Can this PR go in first and I can send a PR for mobilenet V2 separately?

I have no problem. However, I have one question that do we have any TODO for Mobilenet V2 quant model? From my knowledge, we should work well on Mobilenet V2 quant model using this PR and just add test code of Mobilenet V2 quant model.

I don't think there is. I want to do a full 50k image test before adding MobileNetV2. I have done that for InceptionV1 and ensured that its accuracy is same as that provided in TFLite info. I will setup the pre-processing for mobilenet next week, and check end-to-end accuracy, and send a separate PR. What do you say?

I think we don't need to do it you mention. Because we are parsing TFLite quant model and TFLite framework is one baseline. We could treat TFLite quant model like TFLite FP32 model and compare the result between us and TFLite framework. They are no difference. I think we only need to do end-to-end accuracy when we do quantization in TVM inside or we have strong reason to implement ops but don't keep compatibility with TFLite framework.

Added MobileNet V1. MobilenetV2 fails with a segfault in LLVM. Will debug.

What is the detail information of LLVM report? Try the latest?

This is the whole stack trace. Happens while the graph runtime create. Relay build has already passed.

0x00007fff78dd1e64 in llvm::EVT::getExtendedVectorNumElements() const () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so (gdb) bt #0 0x00007fff78dd1e64 in llvm::EVT::getExtendedVectorNumElements() const () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so #1 0x00007fff78ac1090 in llvm::TargetLowering::SimplifyDemandedBits(llvm::SDValue, llvm::APInt const&, llvm::APInt const&, llvm::KnownBits&, llvm::TargetLowering::TargetLoweringOpt&, unsigned int, bool) const () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so #2 0x00007fff78ac0318 in llvm::TargetLowering::SimplifyDemandedBits(llvm::SDValue, llvm::APInt const&, llvm::APInt const&, llvm::KnownBits&, llvm::TargetLowering::TargetLoweringOpt&, unsigned int, bool) const () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so #3 0x00007fff78abe55f in llvm::TargetLowering::SimplifyDemandedBits(llvm::SDValue, llvm::APInt const&, llvm::KnownBits&, llvm::TargetLowering::TargetLoweringOpt&, unsigned int, bool) const () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so #4 0x00007fff789c7b65 in (anonymous namespace)::DAGCombiner::SimplifyDemandedBits(llvm::SDValue, llvm::APInt const&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so #5 0x00007fff789a4339 in (anonymous namespace)::DAGCombiner::visit(llvm::SDNode*) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so #6 0x00007fff7897670c in (anonymous namespace)::DAGCombiner::combine(llvm::SDNode*) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so #7 0x00007fff78975cd3 in llvm::SelectionDAG::Combine(llvm::CombineLevel, llvm::AAResults*, llvm::CodeGenOpt::Level) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so #8 0x00007fff78aaa7b2 in llvm::SelectionDAGISel::CodeGenAndEmitDAG() () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so #9 0x00007fff78aa9939 in llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so #10 0x00007fff78aa6c06 in llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so #11 0x00007fff787c2ffe in (anonymous namespace)::X86DAGToDAGISel::runOnMachineFunction(llvm::MachineFunction&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so #12 0x00007fff78cb0ff4 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so #13 0x00007fff7976c4ba in llvm::FPPassManager::runOnFunction(llvm::Function&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so #14 0x00007fff7976c843 in llvm::FPPassManager::runOnModule(llvm::Module&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so #15 0x00007fff7976cdbf in llvm::legacy::PassManagerImpl::run(llvm::Module&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so #16 0x00007fff78bdee38 in llvm::MCJIT::emitObject(llvm::Module*) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so #17 0x00007fff78bdf094 in llvm::MCJIT::generateCodeForModule(llvm::Module*) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so #18 0x00007fff78be038e in llvm::MCJIT::findSymbol(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so #19 0x00007fff78bdfe58 in llvm::MCJIT::getSymbolAddress(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so #20 0x00007fff78be04ba in llvm::MCJIT::getGlobalValueAddress(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so #21 0x00007fff77dcb794 in tvm::codegen::LLVMModuleNode::LazyInitJIT() () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so #22 0x00007fff77dca02e in tvm::codegen::LLVMModuleNode::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::shared_ptr<tvm::runtime::ModuleNode> const&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so #23 0x00007fff77e0db68 in tvm::runtime::GraphRuntime::CreateTVMOp(tvm::runtime::TVMOpParam const&, std::vector<DLTensor, std::allocator<DLTensor> > const&, unsigned long) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so #24 0x00007fff77e0afe4 in tvm::runtime::GraphRuntime::SetupOpExecs() () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so #25 0x00007fff77e09aad in tvm::runtime::GraphRuntime::Init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::Module, std::vector<DLContext, std::allocator<DLContext> > const&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so #26 0x00007fff77e0f10b in tvm::runtime::GraphRuntimeCreate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::Module const&, std::vector<DLContext, std::allocator<DLContext> > const&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so #27 0x00007fff77e10f8b in std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::$_12>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so #28 0x00007fff77de5577 in TVMFuncCall () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so #29 0x00007fff7e54ee20 in ffi_call_unix64 () from /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so #30 0x00007fff7e54e88b in ffi_call () from /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so #31 0x00007fff7e54901a in _ctypes_callproc () from /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so #32 0x00007fff7e53cfcb in ?? () from /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so #33 0x00000000005c3bd7 in PyObject_Call () #34 0x00000000005354af in PyEval_EvalFrameEx () #35 0x000000000053a81b in PyEval_EvalCodeEx () #36 0x00000000004e3423 in ?? () #37 0x00000000005c3bd7 in PyObject_Call () #38 0x00000000004f08be in ?? () #39 0x00000000005c3bd7 in PyObject_Call ()

anijain2305 · 2019-10-12T07:37:26Z

@FrozenGene Thanks! Can you please review again? I incorporated your comments. I would suggest to have a separate PR for mobilenet. I can take care of that in next couple of weeks.

zhiics

LGTM. For MobilenetV2, I think we can send a separate PR because it looks to me that it wouldn't change structure of this PR (much). @FrozenGene how do you think?

zhiics · 2019-10-13T04:16:09Z

tests/python/frontend/tflite/test_forward.py

+
+def test_forward_qnn_mobilenet_v1_net():
+    """Test the Quantized TFLite Mobilenet V1 model."""
+    # MobilenetV2


s/MobilenetV2/MobilenetV1

Add a todo for MobilenetV2?

FrozenGene · 2019-10-13T04:18:15Z

tests/python/frontend/tflite/test_forward.py

+
+def test_forward_qnn_mobilenet_v1_net():
+    """Test the Quantized TFLite Mobilenet V1 model."""
+    # MobilenetV2


MobilebetV1

FrozenGene · 2019-10-13T04:21:44Z

tests/python/frontend/tflite/test_forward.py

+    tvm_output = run_tvm_graph(tflite_model_buf, data, 'input')
+    tvm_predictions = np.squeeze(tvm_output)
+    tvm_sorted_labels = tvm_predictions.argsort()[-3:][::-1]
+    tvm.testing.assert_allclose(tvm_sorted_labels, tflite_sorted_labels)


Could you explain more why we can not compare tflite_out with tvm_out directly like we do in FP32? I think we could get the same result if we have kept the compatibility with TFLite, if not, why we don’t keep the compatibility?

The only difference between TFLite and QNN is rounding. TFLite reference implementation is reverse-engineered from ARM assembly instructions (which perform 2 roundings in general). QNN follows more formal rounding described here. If you are interested, I would encourage to read the inlined comments in the code - https://github.com/dmlc/tvm/blob/068c148f01fcc81c72004512de320ca1cee24dc6/src/relay/qnn/util.cc#L79-L133

There was also a fair amount of discussion which has been summarized here - #3591 (comment). This should also be helpful.

This rounding can make small variations in the output, because of which we can not do exact check.

I have measured accuracy on TFLite quantized Inception V1-V3 on 50K images using QNN, and observed accuracy parity with TFLite accuracy native execution.

Thanks for the information! @anijain2305 However, if I remember correctly, when I review the code of requantize, I mention that we could use round_away_from_zero, which is used for TFLite. Do we change it later?

Alright, if we change the default value later, In my opinion, I think we should pass round_away_from_zero to requantize operator for TFLite model frontend parsing.The reasons are:

Keep the compatibility with TFLite should be the first priority. Because we are parsing TFLite model, the quantization accuracy is controlled by Tensorflow quantization-aware training and TFLite.

When we run new quantization model of TFLite, we can not get test suites all the time. For example, the customers / algo. team sends us the TFLite model and requires us boost the performance. We should make sure the result is the same as TFLite. The simplest way is to compare the result with TFLite, not do end-to-end accuracy, sometimes we even can not do it.

For our development, we could also do the simple compare with TFLite like FP32, not do end-to-end accuracy. If one op need to requantize internally, we could compare it with TFLite, not need to think about the result is right or not using math when we have different result.

TFlite rounding is not exactly formal round_away_from_zero. You might already know but these are the two functions that are involved in TFLite rounding

using gemmlowp::RoundingDivideByPOT; using gemmlowp::SaturatingRoundingDoublingHighMul;

More at - https://github.com/tensorflow/tensorflow/blob/ff91cd691027076e6128afbd902d3d38e3672787/tensorflow/lite/kernels/internal/common.h#L105

The key idea is that the above two functions map to ARM assembly instructions. But, in this process, they perform approximation that make it little different from formal GNU C rounding. The reasoning behind doing that should be to get performance.

In QNN, we decided to follow GNU C rounding, as thats more formal and can work across more frameworks (like MxNet, PyTorch and not just TFLite).

However, if one wants, one can implement a TFLite rounding. The code is modular enough to add one more rounding option. I think you have good comfortability with TFLite, so I would encourage you to write TFLite rounding if you get time :)

I fully understand your point across more frameworks. My previous point is to emphasize we could provide one option for our TFLite frontend parser so that we could keep the compatibility with TFLite, we don't need to change any default value in our existing code, which could be used for MXNet, PyTorch and so on. Otherwise, the users will confuse why our result is not the same as TFLite when they use TVM. Keep the compatibility with TFLite should be the first priority when we parse TFLite model. So I think we shouldn't leave this to implement by TVM users.

However, as you said, it doesn't affect the code structure. If you agree the point of compatibility point, I think we could make this PR in firstly and open one issue reminding us we should do this thing. If you don't have spare time, I could help to manage it next.

Yes, we can open an issue to track the TFLite rounding. It would not affect code structure. We can set the value of the rounding in TFLite parser, hiding it from the TVM users.

OK. Could you help to open this this to track it? And you could assign this task to me.

anijain2305 · 2019-10-13T20:07:01Z

@u99127 I have incorporated your comments. Please review again. Also please approve explicitly, it helps committers :)

zhenhuaw-me

LGTM generally, some code style comments and model specific concerns only. Thank you for the ping, and sorry for the delayed feedback. I was on holiday and busy with some other projects.

One thing I'd like to comment is that, as the QNN operators implementation is different from TFLite (I mean not trying to emulate float point with integer in operators like quantize/dequantize/requantize), making QNN dialect output elementwisely equals to TFLite in bits seems not a must (TFLite becomes a reference but not a mirror) - not the target of these QNN operators. The evaluation of the design should be the model accuracy I think.

zhenhuaw-me · 2019-10-14T06:36:30Z

python/tvm/relay/frontend/tflite.py

+                                         input_scale=input_tensor.qnn_params['scale'],
+                                         input_zero_point=input_tensor.qnn_params['zero_point'])
+
+        out = _op.nn.softmax(in_expr, **params)


this seems generates float output for softmax, would it be better if we quantize it back to uint8 since the semantic of a uint8 model requires an uint8 output.

That would indeed be better. Doing that.

zhenhuaw-me · 2019-10-14T06:39:37Z

python/tvm/relay/frontend/tflite.py

+                raise tvm.error.OpNotImplemented(
+                    'Operator {} with fused activation is not supported yet.'
+                    .format('qnn.op.concatenate'))


not sure if we can have an clamp here to support it...

We also need to handle different types of activations like Relu/Relu6. I have a draft implementation of activation handling. I will send a separate PR for that once I test it with few cases.

zhenhuaw-me · 2019-10-14T06:42:17Z

python/tvm/relay/frontend/tflite.py

@@ -584,7 +639,14 @@ def convert_fully_connected(self, op):
        weight_value = self.get_tensor_value(weight_tensor)
        weight_expr = self.exp_tab.new_const(weight_value, dtype=weight_tensor_type_str)

-        out = _op.nn.dense(in_expr, weight_expr)
+        # Check if the inputs are quantized. If yes, call qnn dense.


pretty easy-to-read logic making the comments not needed to me :)

zhenhuaw-me · 2019-10-14T06:44:11Z

python/tvm/relay/frontend/tflite.py

+        if not input_tensor.qnn_params:
+            out = _op.nn.dense(in_expr, weight_expr)
+        else:
+            out = _qnn.op.dense(in_expr, weight_expr,
+                                input_zero_point=input_tensor.qnn_params['zero_point'],
+                                kernel_zero_point=weight_tensor.qnn_params['zero_point'],
+                                out_dtype='int32')


generally, i suggest to write such logic as below because the human are more easy to understand not logic to me...

if input_tensor.qnn_params: # path 1 else: # path 2

You are right. Readability is very important. Will change the style.

zhenhuaw-me · 2019-10-14T06:51:03Z

python/tvm/relay/frontend/tflite.py

+                raise tvm.error.OpNotImplemented(
+                    'Operator {} with fused activation is not supported yet.'
+                    .format('qnn.op.conv2d'))


That's interesting, AFAIK, TFLite model converter always fuses the ReLU family activations to Conv/FC operators. The activation of Inception model you are using is not fused?

Yeah, TFLite hosted models do not have any fused activation. There is a fused activation in Object Detection models, but we are far away from supporting object detection (basically dependent on control flow support).

zhenhuaw-me · 2019-10-14T06:52:29Z

python/tvm/relay/frontend/tflite.py

+        if output_tensor.qnn_params:
+            input_scale = input_tensor.qnn_params['scale'] * weight_tensor.qnn_params['scale']
+            input_zero_point = 0
+            out = _qnn.op.requantize(out,
+                                     input_scale=input_scale,
+                                     input_zero_point=input_zero_point,
+                                     output_scale=output_tensor.qnn_params['scale'],
+                                     output_zero_point=output_tensor.qnn_params['zero_point'],
+                                     out_dtype=output_tensor_type_str)


Maybe we can have a small function, requantize_output(out, output_tensor, input_tensor) to wrapper this :)

Good suggestion. But, I will skip this one for now. Currently, there are only 2 uses of requantize. Both of which are quite specific - where we have to multiply the input scales to get the new input scale. As we add more operators, and thus more requantize, I will revisit this.

zhenhuaw-me · 2019-10-14T06:54:07Z

python/tvm/relay/frontend/tflite.py

+                assert self.has_same_qnn_params(input_tensor, output_tensor), \
+                        "TFLite avg_pool2dreshape requires input and output scale" \
+                        "and zero points to be equal"
+                out = _op.cast(in_expr, dtype="int32")


style consistency of the "/' :)

zhiics · 2019-10-14T19:34:01Z

Thanks @anijain2305 @jackwish @FrozenGene @mshawcroft @u99127 for the hard work on this PR. Since we now have approvals from several organizations and the parser is in a good shape, I am going to merge it.

@anijain2305 please be aware of some further work on it, like MobileNetV2 and activation handling.

anijain2305 · 2019-10-14T19:37:09Z

Thanks @zhiics

Yes, I will follow up on both MobilenetV2 and activation handing. Will open a discuss forum to address @FrozenGene concerns regarding TFLite output exact match as well.

anijain2305 force-pushed the tflite_parser branch 3 times, most recently from 0f75650 to a0e7be2 Compare September 5, 2019 21:31

anijain2305 commented Sep 5, 2019

View reviewed changes

anijain2305 marked this pull request as ready for review September 5, 2019 21:39

shoubhik reviewed Sep 9, 2019

View reviewed changes

anijain2305 force-pushed the tflite_parser branch from a0e7be2 to dc7b5b8 Compare September 19, 2019 00:26

anijain2305 force-pushed the tflite_parser branch 4 times, most recently from 1e47074 to ab3b78d Compare September 28, 2019 01:22

mshawcroft reviewed Sep 30, 2019

View reviewed changes

python/tvm/relay/frontend/tflite.py Show resolved Hide resolved

anijain2305 force-pushed the tflite_parser branch from ab3b78d to 6ef06b3 Compare September 30, 2019 17:29

anijain2305 force-pushed the tflite_parser branch from 6ef06b3 to ad29bc6 Compare October 3, 2019 06:32

u99127 suggested changes Oct 3, 2019

View reviewed changes

FrozenGene reviewed Oct 12, 2019

View reviewed changes

anijain2305 force-pushed the tflite_parser branch from ad29bc6 to 9703fd5 Compare October 12, 2019 07:24

anijain2305 force-pushed the tflite_parser branch from 9703fd5 to 9fcd02f Compare October 12, 2019 23:32

zhiics approved these changes Oct 13, 2019

View reviewed changes

FrozenGene reviewed Oct 13, 2019

View reviewed changes

anijain2305 force-pushed the tflite_parser branch from 9fcd02f to b6be6bb Compare October 13, 2019 08:11

zhenhuaw-me approved these changes Oct 14, 2019

View reviewed changes

[QNN][TFLite] Parsing TFLite quantized models.

133e49d

anijain2305 force-pushed the tflite_parser branch from b6be6bb to 133e49d Compare October 14, 2019 16:42

zhiics merged commit bfb811c into apache:master Oct 14, 2019

anijain2305 mentioned this pull request Oct 17, 2019

[QNN][TFLite] Parsing QNN Add op. Adding MobilenetV2. #4142

Merged

anijain2305 added a commit to anijain2305/tvm that referenced this pull request Oct 17, 2019

[QNN][TFLite] Parsing TFLite quantized models. (apache#3900)

92a9339

wweic pushed a commit to neo-ai/tvm that referenced this pull request Oct 18, 2019

[QNN][TFLite] Parsing TFLite quantized models. (apache#3900)

b54230c

FrozenGene mentioned this pull request Nov 8, 2019

Add test for the qnn_add operator #4282

Merged

tqchen mentioned this pull request Nov 8, 2019

[RELEASE][DRAFT] TVM v0.6 Release candidate #4259

Closed

anijain2305 deleted the tflite_parser branch November 13, 2019 00:31

FrozenGene mentioned this pull request Jan 14, 2020

[FRONTEND][TFLITE] Add support for TFLite_Detection_PostProcess #4543

Merged


		# If the tensors are quantized, ensure that input/output qnn params are same.
		if input_tensor.qnn_params is not None:


		# Softmax does not go well with Int8. So Dequantize back to FP32.

[QNN][TFLite] Parsing TFLite quantized models. #3900

[QNN][TFLite] Parsing TFLite quantized models. #3900

Conversation

anijain2305 commented Sep 5, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mshawcroft commented Sep 30, 2019

anijain2305 commented Sep 30, 2019

mshawcroft commented Oct 1, 2019

FrozenGene commented Oct 1, 2019

FrozenGene commented Oct 1, 2019

anijain2305 commented Oct 1, 2019 • edited Loading

anijain2305 commented Oct 3, 2019

u99127 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

anijain2305 Oct 12, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

anijain2305 commented Oct 12, 2019 • edited Loading

zhiics left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

anijain2305 Oct 13, 2019 • edited Loading

Choose a reason for hiding this comment

FrozenGene Oct 14, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

anijain2305 commented Oct 13, 2019

zhenhuaw-me left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zhiics commented Oct 14, 2019

anijain2305 commented Oct 14, 2019

anijain2305 commented Sep 5, 2019 •

edited

Loading

anijain2305 commented Oct 1, 2019 •

edited

Loading

anijain2305 Oct 12, 2019 •

edited

Loading

anijain2305 commented Oct 12, 2019 •

edited

Loading

zhiics left a comment •

edited

Loading

anijain2305 Oct 13, 2019 •

edited

Loading

FrozenGene Oct 14, 2019 •

edited

Loading